familypediawikiaorg-20200214-history
Familypedia:Info pages/technical guidelines
Note that much of this page is probably redundant since the info page system was replaced. See Form:Person.
Technical rationale for info pages
The beauty of wikis is that users are not constrained by the arbitrary limits imposed by structured databases. The problem with wikis for genealogy sites is that genealogy data is inherently structured and voluminous. That makes manual wiki techniques a poor fit for "databasey" problems such as preventing duplicate, inconsistent, and poorly encoded data.
*For a treatment of how info pages help identify duplicates, see Tech Issue:Redundant articles.
New key names
First, identify the practical benefit of encoding the data on an info page. If, for example, the information will likely be used in only one place, or can be reliably inferred from the wikitext, then the item should not be in an info page. For example, if there is a Wikipedia article on the figure, what is the benefit of encoding it in the info page? Using Template:AlsoWP achieves the same result and uses fewer server resources. If at some far future date an inferencing program wants to know which English Wikipedia article corresponds to the genealogy article, referring to the AlsoWP link is sufficient. It doesn't need an explicit declaration in the info page.
Key naming: Follow this strictly: initial letter capitalized, all others lowercase. No exceptions.
Rationale for the use of caps: names should be predictable. The convention used to be all lowercase, but an initial cap allows the keys to be picked out more easily in wikitext code. Some acronyms look unnatural with an initial cap, e.g. Guid. Tough. Most folks have not seen GUID and so won't even notice.
Languages
Info pages are not to be burdened with general translation-dictionary tasks. For example, there should not be a "Sex" key for every language. Instead, use a switch statement based on whether the value equals male or female, or a template that gives the name of the 9th month of the year in the local language.
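The key-naming rule and the switch-based translation described above can be illustrated with a short Python sketch. It is only an illustration of the conventions, not how the wiki templates work (those use parser functions such as a switch); the helper names `normalize_key` and `display_sex` and the `SEX_LABELS` table are hypothetical.

```python
# Hypothetical helpers illustrating two info-page conventions:
# 1. key names use an initial capital with everything else lowercase;
# 2. language-neutral values (e.g. Sex = male/female) are translated at
#    display time instead of being stored under one key per language.


def normalize_key(name: str) -> str:
    """Return a key name in the required 'initial cap, rest lower' form."""
    # str.capitalize() uppercases the first character and lowercases the rest.
    return name.strip().capitalize()


# Hypothetical display table; on the wiki this role is played by a switch
# statement inside a template, not by extra info-page keys.
SEX_LABELS = {
    "en": {"male": "male", "female": "female"},
    "fr": {"male": "homme", "female": "femme"},
}


def display_sex(value: str, lang: str) -> str:
    """Translate a stored Sex value for display, falling back to the raw value."""
    return SEX_LABELS.get(lang, {}).get(value.lower(), value)


if __name__ == "__main__":
    print(normalize_key("death MONTH"))  # Death month
    print(normalize_key("GUID"))         # Guid
    print(display_sex("female", "fr"))   # femme
```

Keeping the stored value language-neutral means that adding a language only grows the display table, not the set of info-page keys.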
If the value of an entity's attribute is unique to the individual, that value should have a language-specific key if the language is being used. Do not create keys for such translated values if you cannot honestly imagine anyone accessing the value in the coming 2 years.
Data model
We are basically following the data model of XML Gedcom (aka Gedcom 6.0), which describes entities and their attributes. The "Event" entity is omitted for now, since it is not clear that typical genealogy users would benefit from that level of detail.
*Unique identifier: GUIDs created by Guidgenerator.
*Referencing: article names are used. This does not necessarily lead to violations of the logical structure, since MediaWiki has robust aliasing support. For example, moved articles leave behind a redirect, which allows info page lookups to keep working. (Caution: double, triple, etc. redirects don't work, but a bot can correct them whenever desired.)
Entities
*Individual: this is a person article.
**Attributes: see Genealogy:Info pages
*Place (theoretical): used for
**Example attributes:
***Longitude/Latitude
***Parent administrative entity (Seattle is part of King County, which is part of Washington state, which is part of the US). We use these for the category hierarchy in a fairly rigid way (though it varies from country to country). However, most place pagenames (e.g. Seattle, Washington) do not contain the full hierarchy, so we must exercise caution with them, possibly asking users to paste the name in from the correct page.
Implementation
*The idea of the template originated with wikipedia:User:Geometry guy/Persondata. This version does not include the table, out of concern that for the vast majority of template calls a large volume of "dead" data would be fetched needlessly. There may also be performance penalties on pages with large numbers of data fetches.
The cost of removing the integrated table is that the page must be refreshed a second time to see changes made to the info table. So flush twice.
*Persondata and hCard fields have been hacked over and may not be functional/canonical as of the time of this edit. This will be cleaned up in due course.
*Some spooky behavior was observed when large numbers of calls were made on a sheet. The evaluator places an arbitrary limit on the number of transclusions it will do per page as a defense against DoS attacks. See pre-expand limit below.
*For efficiency, it may be desirable to bypass helper templates and access values as directly as possible. Given the article King Elvis, the wikitext to access the death month directly would be key=Death month
Template implementation issues
pre-expand limit
Yes, I am aware of the issue. As of Oct 8, 2007, the goal has been a prototype only, without regard to template efficiency. Some of the inefficiency is due to validation queries, and some is due to a reluctance to make data entry harder (e.g. forcing the user to pre-declare how many interwikis there are rather than querying all of them). Client-side form code in JavaScript could relieve this problem somewhat, but that would introduce a barrier to entry for novices, requiring a more complex installation before users can get started. A variety of strategies for dealing with the limits are outlined in the Wikipedia article Wikipedia:Template limits.
Why do the templates have such odd / objectionable code?
# Pre-expand limit technique 1: You will see an #if statement which, when resolved, yields either a template name or 'x0'. This is done because (as described in the Wikipedia article referenced above) any templates inside parser functions are added to the pre-expand count. Templates called through this technique, however, do not add to it.
# Pre-expand limit technique 2: All info templates now place their documentation on a /doc subpage, which is transcluded inside noinclude tags. As explained in the Wikipedia article, even text inside noinclude tags adds to the pre-expand count, so moving the documentation to /doc pages saves over 200K of pre-expand space even on simple pages.
# Pre-expand limit technique 3: If we support large families with 25 children, we have to be super efficient with the family lists. Because of this, the code in the showinfo children template is unavoidably obtuse. The conditionals are nested, so that the child4 field does not have to be evaluated if child3 is blank. A child count is unnecessary, since it gives you nothing more than checking whether ChildN+1 exists.
# Pre-expand limit technique 4: Lists with interwikis are collapsed into a single field. This makes multiple existence checks unnecessary, and the ChildN+1 counting technique is irrelevant anyway, because interwiki entries are not sequential. The downside is that individual entries in the list cannot be accessed (though this would be possible if the templates had a display flag). With such a flag, when you evaluate the list within a template called with the pl parameter set to yes and d=no, every entry falls back to the display default of no and returns nothing, so the fr-wk entry returns no value. The pl-wk entry, however, does return its value, because its pl parameter receives the 'y' value passed to it instead of the default.
Ideas for new engine stuff
*Wikimedia proposal: persistent subst link. The function would be to do a full subst, caching the results of the evaluation on the local page at the time of the last save, while keeping the persistent-subst syntax so that the cached version is refreshed on the next save.
*Mirror the SQL data to a tools server, as with Duesentrieb's stuff, so that we can build tools such as family tree generators and long descendancy lists.
*Investigate the use of JavaScript for forms.
E.g.: edit the info page with helper functions that enter date fields, provide counters for interwikis (a performance issue; see pre-expand limit above), or help translate values (attempt to parse text dates, suggest interwikis by querying the en.wp page for its interwikis).
Technical puzzles
*There is a general problem of coarse data vs. fine data. A field like Marriage is coarse data, and there are others like it: Baptism, Occupation, and so on. For each of these, we would like to know the fine data too if the contributor can provide it, e.g. Marriage city, Marriage county, Marriage state, Marriage year, and so on. We would basically like to have these options for everything, but this may severely burden the current scheme.
**Rather than attempt to nest another info page to do this, we can use brief markers, e.g.: |Marriage = |mYr=1904|mMth=12|mDay=25 |mCity=Seattle |mCount=King |mState=Washington
**Until a JavaScript/PHP data entry form is created, users will likely just input the Marriage value. A bot pass will attempt to convert it to date/place but will undoubtedly make errors. However, the original Marriage value will be retained for comparison (though ignored if the mYr etc. values are present).
category:Info objects documentation
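The brief-marker scheme for coarse vs. fine data can be sketched in Python as a first step a bot pass might take. This is a minimal sketch under stated assumptions, not the Familypedia templates or bot: the function names, the readable field names in `MARKERS`, and the rule "fall back to the coarse Marriage value only when no fine markers are present" are hypothetical, derived from the description above.

```python
# Hypothetical mapping from the brief marker names in the example above
# to readable field names (the readable names are an assumption).
MARKERS = {
    "mYr": "year", "mMth": "month", "mDay": "day",
    "mCity": "city", "mCount": "county", "mState": "state",
}


def parse_params(wikitext: str) -> dict:
    """Split '|key=value|key=value' template parameters into a dict."""
    params = {}
    for part in wikitext.split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def marriage_record(wikitext: str) -> dict:
    """Return fine-grained marriage data, keeping the coarse value for comparison."""
    params = parse_params(wikitext)
    # Keep only recognized markers that actually have a value.
    fine = {MARKERS[k]: v for k, v in params.items() if k in MARKERS and v}
    return {
        "coarse": params.get("Marriage", ""),  # retained for comparison
        "fine": fine,
        # The coarse value is used only when no fine markers are present.
        "use_coarse": not fine,
    }


if __name__ == "__main__":
    rec = marriage_record(
        "|Marriage = |mYr=1904|mMth=12|mDay=25 |mCity=Seattle |mCount=King |mState=Washington"
    )
    print(rec["fine"])
    print(rec["use_coarse"])
```

Keeping the coarse string alongside the parsed fields matches the stated intent that the original Marriage value survives for comparison even after a bot has filled in the fine markers.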